Ford GoBike Communication

In this project I performed an exploratory analysis of data provided by Ford GoBike, a bike-share system provider, using Python visualization techniques. The goal is to identify which variables have the most influence on a bike-sharing service. The analysis covers the 2019 data.

Preliminary Wrangling

Loading the dataset

Cleaning

  1. Deal with the columns that have missing values:
    • start_station_id
    • start_station_name
    • end_station_id
    • end_station_name
  2. Convert start_time and end_time to datetime format.
  3. Split start_time and end_time into hour, day and month columns for further analysis.
  4. Change the type of these columns:
    • bike_id to object (string)
    • start_station_id to object (string)
    • end_station_id to object (string)
    • duration_mins to int64
    • user_type to category
  5. Drop the columns that are not needed for the analysis.
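
A minimal sketch of the cleaning steps listed above, using pandas on a tiny made-up sample rather than the real trip file. Several details are my assumptions and not stated in the list: missing station values are handled by dropping the affected rows, duration_mins is derived from a duration_sec column, and duration_sec is the only column dropped. The function name clean_trips is hypothetical.

```python
import pandas as pd

# Made-up sample rows standing in for the real trip data.
raw = pd.DataFrame({
    "duration_sec": [600, 1530, 305],
    "start_time": ["2019-01-31 17:57:44", "2019-02-01 08:12:10", "2019-02-02 23:40:00"],
    "end_time": ["2019-01-31 18:07:44", "2019-02-01 08:37:40", "2019-02-02 23:45:05"],
    "start_station_id": [21.0, None, 86.0],
    "start_station_name": ["Station A", None, "Station B"],
    "end_station_id": [86.0, 21.0, 21.0],
    "end_station_name": ["Station B", "Station A", "Station A"],
    "bike_id": [4902, 2535, 5905],
    "user_type": ["Subscriber", "Customer", "Subscriber"],
})


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of the trip data."""
    station_cols = [
        "start_station_id", "start_station_name",
        "end_station_id", "end_station_name",
    ]
    # Missing values: assumed here to be handled by dropping the rows.
    df = df.dropna(subset=station_cols).copy()

    # Convert the time columns to datetime, then split out hour, day and month.
    for prefix in ("start", "end"):
        col = f"{prefix}_time"
        df[col] = pd.to_datetime(df[col])
        df[f"{prefix}_hour"] = df[col].dt.hour
        df[f"{prefix}_day"] = df[col].dt.day_name()
        df[f"{prefix}_month"] = df[col].dt.month_name()

    # Type changes. duration_mins is assumed to come from duration_sec.
    df["duration_mins"] = (df["duration_sec"] // 60).astype("int64")
    for col in ("bike_id", "start_station_id", "end_station_id"):
        # Go through int64 first so that 21.0 becomes "21", not "21.0".
        df[col] = df[col].astype("int64").astype(str)
    df["user_type"] = df["user_type"].astype("category")

    # Drop columns not needed later (illustrative choice).
    return df.drop(columns=["duration_sec"])


cleaned = clean_trips(raw)
print(cleaned.dtypes)
```

Dropping rows is the simplest way to handle the missing station values. If those trips matter for the analysis, they could be kept and the station columns filled with a placeholder instead.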

Dataset structure

What is the structure of your dataset?

Columns:

What is/are the main feature(s) of interest in your dataset?

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

Univariate Exploration

Distribution of Rental duration in minutes

Distribution of Customers vs Subscribers

Day, Month and Hour vs Rentals

How many users enrolled in the bike sharing scheme?

Distance travelled by Subscribers and Customers
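
One way to estimate the distance of a trip is the great-circle (haversine) distance between the start and end station coordinates. The sketch below assumes each trip carries start and end latitude/longitude values, which is not stated in this write-up, and the function name haversine_miles is hypothetical. It gives the straight-line distance between stations, not the route actually ridden.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Example with made-up coordinates for a start and an end station.
print(round(haversine_miles(37.7749, -122.4194, 37.8044, -122.2712), 2))
```

Round trips that start and end at the same station come out as 0 miles with this approach, so they may need to be treated separately when comparing Subscribers and Customers.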

Percentage of Rental duration for Subscribers and Customers

Bivariate Exploration

Distribution of rental duration in minutes (subscribers)

Distribution of rental duration in minutes (customers)

Distribution of rental duration in minutes (all users)

Distance travelled in miles by Subscribers

Duration in mins of Rentals for Subscribers and Customers

Stations with the most customer rentals

Stations with the most subscriber rentals

Stations with the most bike share rentals

Multivariate Exploration

Distribution of Start Day and Start Hour for Customers

Heat map of Time vs Subscriber Rentals
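
A heat map of this kind needs a grid of rental counts with one row per day of the week and one column per start hour. The sketch below builds that grid with pandas from made-up rows. It assumes the start_day and start_hour columns produced during cleaning, and the function name rentals_by_day_and_hour is hypothetical.

```python
import pandas as pd

DAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Made-up trips standing in for the cleaned data.
trips = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
    "start_day": ["Monday", "Monday", "Tuesday", "Monday"],
    "start_hour": [8, 8, 17, 8],
})


def rentals_by_day_and_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Count rentals per (day, hour); rows are days, columns are hours."""
    counts = df.groupby(["start_day", "start_hour"]).size().unstack(fill_value=0)
    # Put the days in calendar order and add days with no rentals as zero rows.
    return counts.reindex(DAY_ORDER, fill_value=0)


subscribers = trips[trips["user_type"] == "Subscriber"]
grid = rentals_by_day_and_hour(subscribers)
print(grid)
```

The resulting grid can be passed directly to a plotting function such as seaborn.heatmap. Filtering on "Customer" instead gives the matching grid for Customers.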

Locations

Top 40 locations of rentals for Customers

Top 40 locations of rentals for Subscribers

Top 40 locations of rentals for Bike Share for all users

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?

Summary